A complete reference genome of broomcorn millet

Broomcorn millet (Panicum miliaceum L.), a C4 plant known for drought resistance, adaptability to poor soil, a short growth period, and high photosynthetic efficiency, is one of the earliest domesticated crops globally. This study reports a telomere-to-telomere (T2T) gap-free reference genome for broomcorn millet (AJ8) assembled from PacBio high-fidelity (HiFi) long reads, Oxford Nanopore long reads, and high-throughput chromosome conformation capture (Hi-C) sequencing data. The AJ8 genome is approximately 834.7 Mb in size and is anchored onto 18 pseudo-chromosomes. Notably, 18 centromeres and 36 telomeres were identified. The assembled genome showed high quality and completeness (BUSCO score: 99.6%; QV: 61.7; LAI value: 20.4). In addition, 63,678 protein-coding genes and 433.8 Mb (~52.0%) of repetitive sequences were identified. The complete reference genome for broomcorn millet provides a valuable resource for genetic studies and breeding of this important cereal crop.


Background & Summary
Broomcorn millet (Panicum miliaceum L.), a member of the Paniceae tribe in the Gramineae family, exhibits remarkable adaptability to marginal regions due to its short growing season (60-90 days), low water requirements, high salt tolerance, and efficient nutrient utilization 1,2. As a C4 plant, broomcorn millet demonstrates enhanced carbon fixation and efficient use of water and nitrogen. Additionally, its grains are gluten-free and highly nutritious, with higher protein content, mineral content, and antioxidant levels than most other cereals 3. Consequently, broomcorn millet has been extensively cultivated in semiarid regions across Asia, Europe, and other continents and is considered one of the oldest crops worldwide 4. The cultivation of broomcorn millet holds promise for enhancing food security, diversifying agriculture, and promoting a healthier diet 5. Broomcorn millet has an allotetraploid genome consisting of 36 chromosomes (2n = 4x = 36) 6. Although four chromosome-level assemblies of broomcorn millet, Jinshu7 7, LM_v1 8, LM_v2 9, and Pm_0390 8, have been made available, they still contain missing segments because highly repetitive sequences are clustered across the genome, particularly in the telomere and centromere regions. In recent years, T2T and gap-free genomes have been successfully obtained for various important crops, including rice 10, barley 11, and maize 12.
In the present study, we assembled the first T2T gap-free genome of broomcorn millet (AJ8) (Fig. 1a) using PacBio HiFi long reads, Oxford Nanopore long reads, and Hi-C sequencing data. The resulting complete genome assembly has a final size of 834.7 Mb and is organized into 18 pseudo-chromosomes (Table 1; Fig. 1b). Annotation identified 52.0% repetitive sequences and 63,678 protein-coding genes (Fig. 1b). This complete reference genome provides a robust foundation for future studies on the population and conservation genetics of broomcorn millet.

Methods
Plant materials and growth conditions. The broomcorn millet landrace sequenced in this study was originally collected from the Center for Agricultural Genetic Resources Research, Shanxi Agricultural University, Taiyuan, Shanxi Province (coordinates: E 112° 34′ 26.66″, N 37° 46′ 37.16″). The plants were grown under controlled conditions at a temperature of 25 °C, a humidity of 60%, and a photoperiod of 14 h light and 10 h dark. Twenty seedlings with consistent growth at the fourth-leaf stage were chosen, and tissues were sampled from various organs, including roots, stems, and leaves. For each tissue, 2 g was weighed, immediately frozen in liquid nitrogen, and subsequently stored at −80 °C.
Long insert libraries preparation and sequencing. Genomic DNA was extracted from leaf tissue using the DNeasy Plant Maxi kit (Qiagen). The PacBio long-insert libraries were prepared according to the manufacturer's instructions with an insert size of approximately 20 kb (Pacific Biosciences, USA). The libraries were then sequenced on the PacBio Sequel II platform in circular consensus sequencing mode. The subreads were processed using SMRTLink (v11.1.0) 13 with parameters "-minPasses 3 -minPredictedAccuracy 0.99 -minLength 500", yielding approximately 77.0 Gb of high-fidelity (HiFi) reads with an N50 of about 18.0 kb (Table 2).

The ONT ultra-long insert libraries were generated using the Oxford Nanopore SQK-LSK109 kit and sequenced on a PromethION flow cell (Oxford Nanopore Technologies, Oxford, UK). A total of 165.6 Gb of ONT data (196x coverage) was generated, with a read N50 of 55,765 bp (Table 2). After error correction and length filtering, 34.8 Gb of ultra-long ONT reads with an N50 of 92,975 bp were obtained (Table 2).
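The read statistics quoted above (N50 and fold coverage) follow from simple definitions. The following is a minimal sketch of both calculations, not the pipeline used in the study; the toy read lengths are hypothetical, and using the final assembly length as the genome size is an assumption, since the size estimate behind the reported 196x figure is not stated.

```python
def n50(lengths):
    """Return the N50: the length L such that reads (or contigs) of
    length >= L together contain at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("lengths must not be empty")


def coverage(total_bases, genome_size):
    """Mean sequencing depth = sequenced bases / genome size."""
    return total_bases / genome_size


# Toy read lengths (hypothetical, not taken from the study).
toy_lengths = [2, 3, 4, 5, 6]
print(n50(toy_lengths))  # 5

# Depth implied by the reported data volumes. ASSUMPTION: the final
# assembly length (834,678,208 bp) is used as the genome size; the
# paper's own coverage figure may rest on a different estimate.
ASSEMBLY_BP = 834_678_208
print(round(coverage(165.6e9, ASSEMBLY_BP), 1))  # ONT depth
print(round(coverage(77.0e9, ASSEMBLY_BP), 1))   # HiFi depth
```

With the assembly length as denominator, the ONT depth comes out slightly above the reported 196x, which would be consistent with a somewhat larger genome size estimate having been used.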

Short insert libraries preparation and sequencing.
For chromosomal conformational capture (Hi-C) sequencing, Hi-C libraries based on the DpnII restriction enzyme were generated as previously described 14 and sequenced on the MGISEQ-2000 platform. A total of 253.0 Gb of clean data were obtained from 255.1 Gb of sequencing data using SOAPnuke (v2.0) 15 with parameters "-n 0.01 -l 20 -q 0.1 -i -Q 2 -G 2 -M 2 -A 0.5" (Table 2).
RNA-seq libraries from leaf tissues were constructed using the NEBNext® Ultra™ RNA Library Prep Kit for Illumina® (NEB, Ipswich, MA, USA) following the manufacturer's protocol. The RNA libraries were then sequenced on an MGISEQ-2000 instrument, generating 150 bp paired-end reads. After quality control by fastp (v0.19.5) 16 with parameters "-adapter_sequence AAGTCGGAGGCCAAGCGGTCTTAGGAAGACAA -adapter_sequence_r2 AAGTCGGATCGTAGCCATGTCGTTCTGTGAGCCAAGGAGTTG -average_qual 15 -l 150", each library contained more than 7.8 Gb of clean data. More than 98.1% of the clean data had scores greater than Q20 in each library (Table 3).
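To make the quality figures above concrete, the sketch below shows how a Q20 fraction and a per-read average-quality filter can be computed from FASTQ quality strings. It assumes Phred+33 encoding and an arithmetic mean of Phred scores; it illustrates the idea behind the "-average_qual 15" setting and is not a reimplementation of fastp. The quality strings are hypothetical.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string (Phred+33 assumed) to integer scores."""
    return [ord(ch) - offset for ch in quality_string]


def fraction_at_least(quality_strings, threshold=20):
    """Fraction of all bases whose Phred score is >= threshold (e.g. Q20)."""
    total = 0
    passed = 0
    for quality in quality_strings:
        scores = phred_scores(quality)
        total += len(scores)
        passed += sum(1 for score in scores if score >= threshold)
    return passed / total if total else 0.0


def passes_average_quality(quality_string, min_average=15):
    """Keep a read only if its mean Phred score reaches min_average.
    ASSUMPTION: a plain arithmetic mean of the scores is used here."""
    scores = phred_scores(quality_string)
    return bool(scores) and sum(scores) / len(scores) >= min_average


# Hypothetical reads: 'I' = Q40, '5' = Q20, '!' = Q0.
reads = ["II5!", "!!!!"]
kept = [q for q in reads if passes_average_quality(q)]
print(kept)                     # ['II5!']
print(fraction_at_least(kept))  # 0.75
```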

Genome assembly.
Using HiFi reads, ultra-long ONT reads, and Hi-C clean data, the primary contigs were assembled with Hifiasm (v0.19.5) 17 using default parameters. To anchor contigs onto chromosomes, we used BWA (v0.7.12) 18 to align the Hi-C clean data to the assembled contigs. Low-quality reads were filtered out using the HiC-Pro pipeline 19 with default parameters. The remaining valid reads were used to anchor chromosomes with Juicer 20 and the 3d-dna pipeline 21. Our results showed that the hifiasm assembly consists of contiguous sequences covering the entire length of all chromosomes. This result can be attributed to the high accuracy of HiFi data, the use of ultra-long ONT data, and ongoing improvements in assembly algorithms. Analogous to the T2T genome of rapeseed 22, the hifiasm assembly comprises continuous sequences spanning the entirety of nine chromosomes.

Table 6. Correspondence between chromosome identifications of AJ8 and Jinshu7. Columns: Primary ID; New ID after subgenome phaser. # Consistency of AJ8 new ID with Jinshu7 genome.

For further refinement of the genome, the T2T assembly was polished using a method similar to that described by Mc Cartney et al. 23. In brief, the HiFi reads were aligned to the T2T assembly using Winnowmap2 (v2.03) 24. The resulting alignments were filtered with the 'falconc bam-filter-clipped' tool to exclude secondary alignments and alignments with excessive clipping. Finally, racon (v1.5.0) 25 was run on the filtered alignments. The final assembled genome had a length of 834,678,208 bp and a contig N50 of 48.3 Mb (Table 1). The assembled sequences were successfully anchored to 18 pseudo-chromosomes (Table 1). The completeness of the assembly was assessed using Benchmarking Universal Single-Copy Orthologs (BUSCO) (v5.5.0) 26 with the embryophyta_odb10 dataset (parameters: -m genome -l embryophyta_odb10). We applied Merqury (v1.3) 27 to the PacBio HiFi long reads with a k-mer size of 17 bp to estimate the quality value.
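The alignment filtering step used before polishing can be illustrated on SAM text records. The sketch below drops secondary alignments (SAM flag bit 0x100) and alignments whose soft- or hard-clipped bases exceed a cutoff. The cutoff value and the example records are hypothetical, and the actual criteria applied by 'falconc bam-filter-clipped' are not reproduced here.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
SECONDARY = 0x100  # SAM FLAG bit marking a secondary alignment


def clipped_bases(cigar):
    """Total soft-clipped (S) plus hard-clipped (H) bases in a CIGAR string."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "SH")


def keep_alignment(sam_line, max_clipped=100):
    """Return True if a SAM record is primary/supplementary and not
    excessively clipped. max_clipped is a HYPOTHETICAL threshold."""
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])
    cigar = fields[5]
    if flag & SECONDARY:
        return False
    if cigar == "*":  # no alignment information available
        return False
    return clipped_bases(cigar) <= max_clipped


# Hypothetical records (11 mandatory SAM columns, sequence omitted as '*').
records = [
    "r1\t0\tchr1\t100\t60\t5000M\t*\t0\t0\t*\t*",
    "r2\t256\tchr1\t100\t0\t5000M\t*\t0\t0\t*\t*",
    "r3\t0\tchr1\t100\t60\t500S4500M\t*\t0\t0\t*\t*",
]
print([r.split("\t")[0] for r in records if keep_alignment(r)])  # ['r1']
```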

Gene expression analysis.
Gene expression analysis followed the method reported previously 49. The expression heatmap was constructed using the heatmap R package. The expression matrix of genes in the different transcriptome samples is shown in Table S2.
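Expression levels in this dataset are reported as FPKM, with log2-transformed values shown in the heatmap (Fig. 3d). The sketch below shows the standard FPKM formula and a simple expression call. The pseudocount of 1 in the log transform and the "expressed in at least one sample" rule are assumptions made for illustration, as the exact settings are given in the cited method rather than here.

```python
import math


def fpkm(fragment_count, gene_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragment_count * 1e9 / (gene_length_bp * total_mapped_fragments)


def log2_fpkm(value, pseudocount=1.0):
    """log2 transform for plotting. ASSUMPTION: pseudocount of 1 avoids log2(0)."""
    return math.log2(value + pseudocount)


def is_expressed(fpkm_by_sample, threshold=1.0):
    """ASSUMPTION: a gene counts as expressed if FPKM >= threshold
    in at least one sample."""
    return any(value >= threshold for value in fpkm_by_sample)


# Hypothetical gene: 100 fragments, 1 kb long, 1 million mapped fragments.
value = fpkm(100, 1_000, 1_000_000)
print(value)                      # 100.0
print(is_expressed([0.2, value])) # True
```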

Synteny analysis.
Syntenic regions were identified from homology searches using MCScan (Python version) 50, with a minimum of 30 genes per block.
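The block-size requirement can be pictured as a filter over syntenic blocks. The following sketch parses an anchors-style listing in which blocks are separated by lines starting with "#" and each remaining line holds a gene pair, then keeps blocks with at least 30 pairs. Treating a block's size as its number of anchor pairs is an assumption, and the gene names are hypothetical; this is not MCScan's own implementation.

```python
def read_blocks(lines):
    """Group gene-pair lines into blocks; a line starting with '#'
    closes the current block and starts a new one."""
    blocks = []
    current = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            if current:
                blocks.append(current)
                current = []
            continue
        gene_a, gene_b = line.split()[:2]
        current.append((gene_a, gene_b))
    if current:
        blocks.append(current)
    return blocks


def filter_blocks(blocks, min_genes=30):
    """Keep blocks with at least min_genes anchor pairs."""
    return [block for block in blocks if len(block) >= min_genes]


# Hypothetical input: one block of 30 pairs, one block of 5 pairs.
lines = ["###"]
lines += [f"AJ8_g{i}\tJS7_g{i}\t100" for i in range(30)]
lines += ["###"]
lines += [f"AJ8_h{i}\tJS7_h{i}\t100" for i in range(5)]

blocks = read_blocks(lines)
kept = filter_blocks(blocks)
print(len(blocks), len(kept))  # 2 1
```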

Subgenome phasing.
Using repetitive k-mers as "differential signatures", we phased the subgenomes of AJ8 with SubPhaser 51. The resulting subgenome assignments were consistent with the Jinshu7 genome (Table 6; Fig. 4).
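The idea of differential k-mers can be shown on a toy pair of homoeologous sequences: a k-mer is "differential" when it is repetitive in one sequence and markedly rarer in the other. The sketch below is a simplified illustration only. The count and fold thresholds, the tiny k, and the sequences are hypothetical, counts are not normalized for sequence length, and SubPhaser's actual procedure (including its clustering step) is not reproduced.

```python
from collections import Counter


def count_kmers(sequence, k):
    """Count all overlapping k-mers in a sequence."""
    sequence = sequence.upper()
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def differential_kmers(seq_a, seq_b, k=3, min_count=3, min_fold=2.0):
    """Return {kmer: 'A' or 'B'} for k-mers that are repetitive
    (count >= min_count) in one sequence and at least min_fold times
    more frequent there than in the other. Thresholds are HYPOTHETICAL."""
    counts_a = count_kmers(seq_a, k)
    counts_b = count_kmers(seq_b, k)
    assigned = {}
    for kmer in set(counts_a) | set(counts_b):
        a, b = counts_a[kmer], counts_b[kmer]
        high, low = max(a, b), min(a, b)
        # max(low, 1) avoids a zero denominator for k-mers absent from one side.
        if high >= min_count and high >= min_fold * max(low, 1):
            assigned[kmer] = "A" if a > b else "B"
    return assigned


# Toy homoeologs carrying different repeats (hypothetical sequences).
homoeolog_a = "ACG" * 10
homoeolog_b = "TTG" * 10
signatures = differential_kmers(homoeolog_a, homoeolog_b)
print(signatures["ACG"], signatures["TTG"])  # A B
```

In a real analysis, chromosomes would then be grouped by their profiles across many such k-mers, which is what the clustering and principal component panels in Fig. 4 summarize.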

Data records
The sequencing data and assembled genome sequence have been deposited in the Sequence Read Archive under accession number SRP482566 52 (project number PRJNA1059665). The genome assembly has been deposited at GenBank under the WGS accession GCA_038442765.1 53. Files of the gene structure annotation, repeat predictions, and gene functional annotation were deposited at Figshare 54.

Technical Validation
Genome assembly and gene prediction quality assessment. The accuracy and integrity of the AJ8 T2T assembly were assessed through several analyses. Firstly, the Hi-C heatmap displayed consistent results across all chromosomes, indicating the correct ordering and orientation of contigs in the assembly (Fig. 2a). Secondly, the assembly successfully captured 18 centromeres and all 36 telomeres, providing strong evidence for its integrity (Fig. 2b; Tables 7, 8). Thirdly, the assembly showed high collinearity with Jinshu7 (GCA_026771285.1) 7 and Panicum hallii (GCA_002211085.2) 55 (Fig. 2c). Fourthly, alignment results from minimap2 (v2.24-r1122) 56 revealed that 100.0% of ONT reads and 99.98% of HiFi reads could be aligned to the AJ8 T2T assembly. Additionally, the average genome alignment rate of the transcriptome was 98.7% (Table 3). Lastly, AJ8 T2T demonstrated an LTR assembly index of 20.4, a quality value of 61.7, and a BUSCO score of 99.6%, indicating its high completeness (Table 1).
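The quality value reported above is a Phred-scaled error rate, so it can be converted back to an expected per-base error. The sketch below shows that conversion together with a k-mer based QV estimate of the form commonly described for Merqury, in which the fraction of assembly k-mers also found in the reads is turned into a per-base accuracy. Whether this exactly matches the computation run in the study is an assumption, and the k-mer counts in the example are hypothetical.

```python
import math


def qv_to_error_rate(qv):
    """Phred scale: QV = -10 * log10(error rate)."""
    return 10 ** (-qv / 10)


def error_rate_to_qv(error_rate):
    """Inverse of qv_to_error_rate."""
    return -10 * math.log10(error_rate)


def kmer_qv(shared_kmers, total_kmers, k):
    """k-mer based QV estimate.
    P(base correct) = (shared / total) ** (1 / k); error = 1 - P.
    ASSUMPTION: this follows the formulation commonly described for Merqury."""
    p_correct = (shared_kmers / total_kmers) ** (1 / k)
    return error_rate_to_qv(1 - p_correct)


# QV 61.7 corresponds to a per-base error rate on the order of 7e-7,
# i.e. well under one expected error per megabase.
print(qv_to_error_rate(61.7))

# Hypothetical counts: 1 of 1,000,000 assembly 17-mers absent from the reads.
print(round(kmer_qv(999_999, 1_000_000, 17), 1))
```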
We compared the gene length distribution among AJ8, maize 12, sorghum (GCF_000003195.3) 39, LM_v1 8, and Jinshu7 7 and found similar patterns (Fig. 3a). The BUSCO analysis showed that 99.5% (single-copy: 19.5%; duplicated: 80.0%) of 1,614 embryophyta single-copy orthologs were identified as complete, while 0.2% were fragmented and 0.3% were missing in the assembly (Fig. 3b). In total, 62,885 gene models (98.8%) were annotated in diverse databases (Fig. 3c; Table 5), and 51,958 gene models (81.6%) exhibited detectable transcriptional activity (FPKM value ≥ 1) across 21 RNA-seq samples (Fig. 3d). Taken together, these results provide strong evidence that a high-quality AJ8 genome has been obtained. The high-quality genome provides a solid foundation for uncovering the drought resistance and adaptive mechanisms of AJ8, and also serves as an important reference for the rapid breeding of AJ8 and other crops.
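The percentages quoted in this section can be cross-checked against the reported counts. The short sketch below recomputes them from the numbers stated in the text; it only restates arithmetic on published figures and introduces no new data.

```python
def percent(part, whole, digits=1):
    """Percentage of whole, rounded to the given number of decimals."""
    return round(100 * part / whole, digits)


TOTAL_GENES = 63_678  # protein-coding genes reported for AJ8

# Functionally annotated and transcriptionally active gene models.
print(percent(62_885, TOTAL_GENES))  # 98.8
print(percent(51_958, TOTAL_GENES))  # 81.6

# Repeat content: 433.8 Mb of a 834.7 Mb assembly.
print(percent(433.8, 834.7))  # 52.0

# BUSCO categories (percent of 1,614 orthologs) should sum to 100.
busco = {"single": 19.5, "duplicated": 80.0, "fragmented": 0.2, "missing": 0.3}
complete = busco["single"] + busco["duplicated"]
print(complete)                       # 99.5
print(round(sum(busco.values()), 1))  # 100.0
```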

Fig. 1 An overview of the AJ8 genome. (a) Photograph of the AJ8 plant. (b) Circos plot of the AJ8 genome. The plot includes the following components, arranged from inside to outside: (I) collinear regions within the AJ8 assembly; (II) GC content in non-overlapping 1-Mb windows; (III) percentage of repeats in 1-Mb sliding windows; (IV) gene density in 1-Mb sliding windows; (V) length of pseudochromosomes in megabases (Mb).

Fig. 2 Quality of the AJ8 genome assembly. (a) Heatmap of Hi-C interactions among AJ8 pseudomolecules. (b) Telomere detection map. Triangles and circles represent telomeres and centromeres, respectively, within the AJ8 assembled chromosomes. Orange represents regions with high gene density, and dark sky blue represents regions with low gene density. (c) Synteny analysis of Panicum hallii, AJ8 and Jinshu7.

Fig. 3 Gene prediction quality assessment. (a) Gene length distribution in the AJ8 genome compared with the genomes of other species. (b) BUSCO assessment of the AJ8 genes. (c) Venn diagram showing the number of genes with homology or functional classification by each method. (d) Expression heatmap of expression levels among 21 RNA-seq samples. The color bar in the lower right corner represents log2-transformed FPKM values. Blue and red boxes indicate genes with lower and higher expression levels, respectively.

Fig. 4 Phased subgenomes of the AJ8 genome. (a) Histogram of differential k-mers among homoeologous chromosome sets. (b) Heatmap and clustering of differential k-mers. The x-axis shows differential k-mers and the y-axis shows chromosomes. The vertical color bar indicates the subgenome to which each chromosome is assigned; the horizontal color bar indicates the subgenome to which each k-mer is specific (blank for non-specific k-mers). (c) Principal component analysis of differential k-mers. Points indicate chromosomes.

Table 1. Summary statistics of broomcorn millet genome assemblies.

Table 2. Summary of sequencing data of AJ8 genome.

Table 3. Summary of RNA-seq sequencing data of AJ8 genome.

Table 4. Interspersed repeat contents in AJ8 genome assembly. Note: This table does not include tandem repeats; some elements may partly include the domain of another element. * Combined: the non-redundant consensus of all repeat prediction/classification methods employed. † Unclassified: the predicted repeats that cannot be classified by RepeatMasker. LINE, long interspersed nuclear elements; SINE, short interspersed nuclear elements; LTR, long terminal repeat.

Table 5. Number of functional annotations for predicted genes in AJ8 assembly.

Table 7. The identified telomeres in AJ8 assembly.

Table 8. The distribution of centromeres in AJ8 assembly.